Sixth Quarterly Report

STUDY ON SPECTRAL/RADIOMETRIC CHARACTERISTICS 
OF THE THEMATIC MAPPER FOR LAND USE APPLICATIONS 


1. OBJECTIVE 

The objective of this investigation is to quantify the performance
of the TM as manifested by the quality of its image data, in order to
suggest improvements in data production and to assess the effects of
the data quality on its utility for land resources applications. The
analysis covers three categories: a) radiometric effects, b) spatial
effects, and c) geometric effects, with emphasis on radiometric effects.

2. TASKS 

Four tasks have been established to address the above objective.
The first three are to study radiometric performance, spatial
performance and geometric performance, respectively, while the fourth is
to study spectral characteristics. In keeping with the identified
objective, the radiometric performance study is the major task.

3. STATUS AND TECHNICAL PROGRESS 

During this sixth quarterly reporting period, efforts were concentrated
on developing a measure of the information content of multispectral
data, such as Thematic Mapper (TM) and Multispectral Scanner (MSS) data,
and then comparing results obtained upon applying the measure to
simultaneous data from TM and MSS.

3.1 PROBLEMS 


None.

3.2 ACCOMPLISHMENTS 

An information-theoretic measure of multispectral information content
was developed and applied to simultaneous Landsat TM and MSS data
sets, and preliminary observations and comparisons were made.

3.2.1 OBJECTIVES 

With multispectral data sets from remote sensing systems, questions
arise as to the relative merits of individual spectral bands, groups of
spectral bands, and transformed spectral variables. Classification-based
measures are frequently used for such comparisons, as are variance-based
measures. The objectives of the work reported here were to develop a
class-independent and non-parametric measure and to apply it to Landsat
TM and MSS data sets; the measure developed is based on
information-theoretic principles.

3.2.2 APPROACH 

A communications-theory approach is taken to analyze the dispersion
and concentration of signal values in various data spaces, irrespective
of any specific class memberships. Entropy, as defined by Shannon, is
used as the basic measure of information. The process of selecting a
subset of bands is viewed as the transmission of data through a lossy
communication channel, and the mutual information between the input and
output is the measure of information transfer, i.e., the information
represented by the subset.

The new measure was applied to Landsat Multispectral Scanner (MSS)
and six-band Thematic Mapper (TM) data of two types. These are
simulated data values derived from field-measured reflectance spectra of
agricultural crops and soils and an atmospheric model, and actual
Landsat-4 MSS and TM data acquired simultaneously from an agricultural
scene in North Carolina. These data were used in a prior comparison
we made of the spatial and spectral characteristics of Landsat TM and
MSS data [1,2].

Several different comparisons of information content are made.
These include comparison of TM and MSS system-design information
capacities, comparison of the data-space volumes spanned by the
agricultural data in the spaces defined by original bands and by
transformed spectral (Tasseled-Cap) variables, comparison of the
agricultural information content of original bands to that of
transformed variables, and comparison of the agricultural information
content of TM to that of MSS.

3.2.3 INFORMATION MEASURE DERIVATION 

3.2.3.1 Basic Concepts. Shannon defined self information
as a measure of the information associated with knowing the occurrence
of a signal state x_i which occurs with probability P(x_i):

    I(x_i) = \log_2 \frac{1}{P(x_i)} = -\log_2 P(x_i)   (bits)      (1)


The more rare the event, the greater is one's uncertainty about when
it will occur and, consequently, the greater is the information conveyed
when it is observed. Entropy, given the symbol H, is the value of self
information when averaged over all N possible states of x:

    H(x) = \sum_{i=1}^{N} P(x_i) \log_2 \frac{1}{P(x_i)}      (2)
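
Equations (1) and (2) can be illustrated with a minimal Python sketch. The four-state probability list is hypothetical and chosen only so that the arithmetic is easy to check by hand; the function names are likewise illustrative.

```python
import math

def self_information(p):
    """Self information in bits of a state with probability p (Equation 1)."""
    return -math.log2(p)

def entropy(probs):
    """Entropy in bits (Equation 2); zero-probability states contribute nothing."""
    return sum(p * self_information(p) for p in probs if p > 0)

# Hypothetical four-state signal, chosen only for illustration.
probs = [0.5, 0.25, 0.125, 0.125]
print(self_information(0.125))  # rare state: 3.0 bits
print(entropy(probs))           # 1.75 bits
```

The rarest states carry the most self information (3 bits each), while the average over all states, the entropy, is 1.75 bits.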

With two variables, the use of joint and conditional probabilities
is necessary:

    H(x,y) = H(x) + H(y|x)      (3a)
or
    H(x,y) = H(y) + H(x|y)      (3b)
since
    P(x,y) = P(x) P(y|x)      (4a)
or
    P(x,y) = P(y) P(x|y)      (4b)


In computing the conditional entropy, the weighting assigned to each
information term is the joint probability of the states involved; for
example,

    H(x|y) = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} P(x_i,y_j) \log_2 \frac{1}{P(x_i|y_j)}      (5)
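
As a hedged illustration of Equations (3b) and (5), the following Python sketch computes the conditional entropy with joint-probability weights and checks it against the joint and marginal entropies. The 2 x 2 joint probability table is hypothetical.

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def conditional_entropy_x_given_y(joint):
    """H(x|y) per Equation 5: each term is weighted by the joint probability."""
    p_y = [sum(col) for col in zip(*joint)]
    h = 0.0
    for row in joint:
        for j, p_xy in enumerate(row):
            if p_xy > 0:
                # P(x_i|y_j) = P(x_i,y_j) / P(y_j), so 1/P(x_i|y_j) = P(y_j)/P(x_i,y_j)
                h += p_xy * math.log2(p_y[j] / p_xy)
    return h

# Hypothetical joint probabilities P(x_i, y_j); rows are x states, columns are y states.
joint = [[0.25, 0.25],
         [0.50, 0.00]]

h_xy = entropy([p for row in joint for p in row])
h_y = entropy([sum(col) for col in zip(*joint)])
h_x_given_y = conditional_entropy_x_given_y(joint)
print(h_xy, h_y + h_x_given_y)  # the two values agree, as Equation 3b requires
```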

If we consider x to be the input to a communication channel and
y to be the output, we can define the mutual information transferred
between them, i.e., I_M(x;y), as

    I_M(x;y) = H(x) - H(x|y)      (6)


In words, the mutual information exchanged is the difference between
H(x), the information content of the input, and H(x|y), the uncertainty
about x when we are given the output y. When the total information is
transferred, H(x|y) = 0 and I_M(x;y) = H(x). At the other extreme, when y
does not contain any information relatable to x, H(x|y) = H(x) and
therefore I_M(x;y) = 0, i.e., the mutual information is zero.

A convenient measure of channel (signal transformation) efficiency
is the relative entropy or the ratio of mutual information to the total
information of the input:

    H_r = \frac{I_M(x;y)}{H(x)}      (7a)

    H_r = \frac{H(x) - H(x|y)}{H(x)} = 1 - \frac{H(x|y)}{H(x)}      (7b)
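
The two extremes described for Equation (6), and the efficiency ratio of Equation (7), can be checked numerically. The sketch below is illustrative only: the two joint probability tables are hypothetical, and H(x|y) is obtained as H(x,y) - H(y) from Equation (3b).

```python
import math

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I_M(x;y) = H(x) - H(x|y), with H(x|y) = H(x,y) - H(y)."""
    h_x = entropy([sum(row) for row in joint])
    h_y = entropy([sum(col) for col in zip(*joint)])
    h_xy = entropy([p for row in joint for p in row])
    return h_x - (h_xy - h_y)

def channel_efficiency(joint):
    """Ratio of mutual information to the input entropy (Equation 7)."""
    return mutual_information(joint) / entropy([sum(row) for row in joint])

# Hypothetical channels: rows are input states x, columns are output states y.
lossless = [[0.5, 0.0], [0.0, 0.5]]       # y determines x exactly
useless = [[0.25, 0.25], [0.25, 0.25]]    # y is independent of x

print(channel_efficiency(lossless))  # all input information is transferred
print(channel_efficiency(useless))   # no input information is transferred
```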
3.2.3.2 Multispectral Extension. The above concepts can be
extended to multispectral variables by letting the variables x and y
become multidimensional vectors X and Y, with X = (X_1, X_2, ..., X_{N_x})
and Y = (Y_1, Y_2, ..., Y_{N_y}). Usually, N_y \le N_x. The transformation
achieved by the communication channel is used here in a general sense,
to represent both simple selections of spectral band subsets and more
complex transformations, such as the Tasseled-Cap Transformation.

The entropy of the input {X} becomes a function of the frequency
with which individual signal-space cells or states are populated. Since
the Thematic Mapper has six reflective bands, the equations are presented
here in terms of six variables, although they should be adaptable to any
number. The total information is:

    H(X) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \sum_{k=1}^{N_3} \sum_{l=1}^{N_4} \sum_{m=1}^{N_5} \sum_{n=1}^{N_6}
           P(X_{1i},X_{2j},X_{3k},X_{4l},X_{5m},X_{6n})
           \log_2 \frac{1}{P(X_{1i},X_{2j},X_{3k},X_{4l},X_{5m},X_{6n})}      (8a)

where P(X_{1i},X_{2j},X_{3k},X_{4l},X_{5m},X_{6n}) is the frequency with which
state X(ijklmn) is populated,

and N_r is the number of populated levels of variable X_r.

To shorten subsequent equations, abbreviated notation will at times
be used, e.g.,

    H(X) = \sum_{ijklmn} P_X(ijklmn) \log_2 \frac{1}{P_X(ijklmn)}

The total number of possible states or cells,

    N_{cells} = N_1 N_2 N_3 N_4 N_5 N_6      (8b)


can be very large, but the vast majority will not be populated. From
a calculational standpoint,

    P_X(ijklmn) = \frac{C_{ijklmn}}{\sum_{ijklmn} C_{ijklmn}} = \frac{C_{ijklmn}}{N_{obs}}      (9)

where C_{ijklmn} is the count of occurrences in the cell having
Level i in X_1, Level j in X_2, etc.,

and \sum_{ijklmn} C_{ijklmn} = N_{obs} is the total number of observations
in the data set being analyzed.


Additional insight into the meaning of H(X) and mutual information
and their calculation comes from a different version of Equation 8:

    H(X) = \sum_{ijklmn} \frac{C_{ijklmn}}{\sum_{ijklmn} C_{ijklmn}}
           \log_2 \frac{\sum_{ijklmn} C_{ijklmn}}{C_{ijklmn}}

         = \frac{1}{\sum_{ijklmn} C_{ijklmn}} \sum_{ijklmn}
           [ C_{ijklmn} \log_2 ( \sum_{ijklmn} C_{ijklmn} ) - C_{ijklmn} \log_2 C_{ijklmn} ]

         = \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 N_{obs}
           - \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 C_{ijklmn}

    H(X) = \log_2 N_{obs} - \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 C_{ijklmn}      (10)

    First term:  information if each observation were in a unique cell
    Second term: information loss due to clustering of the observations
                 into a subset of cells


The entropy of X is expressed in Equation (10) as the difference between
two terms. The first, \log_2 N_{obs}, is the maximum possible information
associated with the given number of observations, i.e., the information
that would be present if each observation were unique and occupied a
unique cell in the signal space. The second term represents the
information that is lost by any clustering of observations into a subset
of cells.
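
The count-based form of Equation (10) lends itself to direct computation from a list of pixel vectors. The following Python sketch is a minimal illustration, not the authors' program; the six-band pixel values are hypothetical, and the probability form of Equations (8) and (9) is computed alongside for comparison.

```python
import math
from collections import Counter

def entropy_from_cells(pixels):
    """H(X) via Equation 10: log2(N_obs) minus the clustering loss."""
    counts = Counter(pixels)              # C_ijklmn for each populated cell
    n_obs = len(pixels)
    loss = sum(c * math.log2(c) for c in counts.values()) / n_obs
    return math.log2(n_obs) - loss

def entropy_from_probabilities(pixels):
    """H(X) via Equations 8 and 9, for comparison."""
    counts = Counter(pixels)
    n_obs = len(pixels)
    return sum((c / n_obs) * math.log2(n_obs / c) for c in counts.values())

# Hypothetical six-band pixels (tuples of digital counts), for illustration only.
pixels = ([(10, 12, 9, 40, 30, 15)] * 4
          + [(11, 12, 9, 41, 30, 15)] * 2
          + [(30, 28, 33, 35, 50, 40), (31, 28, 33, 35, 50, 40)])

print(math.log2(len(pixels)))              # maximum possible information: 3.0 bits
print(entropy_from_cells(pixels))          # 1.75 bits after the clustering loss
print(entropy_from_probabilities(pixels))  # same value from the probability form
```

Only populated cells are ever stored, so the very large number of possible cells does not enter the calculation. The first printed value also shows why H(X) can never exceed \log_2 N_{obs}.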


Through use of conditional probabilities such as:

    P_X(ijklmn) = P(X_{1i}) P(X_{2j}|X_{1i}) P(X_{3k}|X_{1i},X_{2j}) \cdots P(X_{6n}|X_{1i},X_{2j},X_{3k},X_{4l},X_{5m})

we can have a variety of expressions for H(X):

    H(X) = H(X_1) + H(X_2|X_1) + H(X_3|X_1,X_2) + ... + H(X_6|X_1,X_2,X_3,X_4,X_5)      (11a)

    H(X) = H(X_6) + H(X_2|X_6) + H(X_3|X_2,X_6) + ... + H(X_1|X_2,X_3,X_4,X_5,X_6)      (11b)

etc.
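
The chain expansions of Equations (11a) and (11b) can be verified numerically. The sketch below uses a hypothetical three-band data set (three bands rather than six, only to keep the example short) and computes each conditional entropy directly from counts before summing.

```python
import math
from collections import Counter

def entropy(items):
    """Entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    n = len(items)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def conditional_entropy(pixels, target, given):
    """H(X_target | X_given) computed from joint and conditioning counts."""
    n = len(pixels)
    joint = Counter(tuple(p[b] for b in given + [target]) for p in pixels)
    cond = Counter(tuple(p[b] for b in given) for p in pixels)
    # P(target | given) = joint count / conditioning count
    return sum((c / n) * math.log2(cond[key[:-1]] / c) for key, c in joint.items())

# Hypothetical three-band pixels, for illustration only.
pixels = [(1, 5, 7), (1, 5, 8), (1, 6, 7), (2, 6, 7), (2, 6, 7), (2, 6, 9)]

def chain_sum(order):
    """Sum of conditional entropies taken in the given band order."""
    return sum(conditional_entropy(pixels, order[k], order[:k])
               for k in range(len(order)))

print(entropy(pixels), chain_sum([0, 1, 2]), chain_sum([2, 1, 0]))
```

Any ordering of the bands gives the same total, which is what allows the expansion to be arranged so that the eliminated band appears last.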


3.2.3.3 Spectral Band Subsetting. The selection of subsets of
spectral bands is a special case of the mutual information expression,

    I_M(X;Y) = H(X) - H(X|Y)

where Y now is a subset X' of the X variables, so

    I_M(X;X') = H(X) - H(X|X')

Whenever a variable, say X_p, is retained, its conditional probability
term becomes unity, its contribution to H(X|X') is reduced to zero, and
its information content is retained as mutual information. Whenever a
variable, say X_q, is eliminated, there is a loss of mutual information.
This loss is represented by the conditional entropy term through all
conditional probability components in which X_q occurs on the left-hand
side of the conditional probability indicator line but not on the
right-hand (or given) side. The family of entropy relationships
illustrated by Equations (11a) and (11b) helps define the required
calculations.


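Single-Band Subsets. The mutual information represented by
single-band subsets is:

    I_M(X;X_p) = H(X_p) = \sum_{a=1}^{N_p} P(X_{pa}) \log_2 \frac{1}{P(X_{pa})}

where p is the band selected and a is the corresponding subscript which
indicates the level. This term can be computed from a histogram of
signal levels from the band of interest. Alternatively, it can be
expressed in terms of the total signal space represented by the data set:

    I_M(X;X_p) = \sum_{ijklmn} P_X(ijklmn) \log_2 \frac{1}{P(X_{pa})}

where P_X(ijklmn) = \frac{C_{ijklmn}}{N_{obs}}

and, for example,

    P(X_{1i}) = \frac{1}{N_{obs}} \sum_{j=1}^{N_2} \cdots \sum_{n=1}^{N_6} C_{ijklmn}

Expanding this expression, for X_1 as an example, we have:

    I_M(X;X_1) = \sum_{ijklmn} \frac{C_{ijklmn}}{N_{obs}}
                 \log_2 \frac{N_{obs}}{\sum_{j=1}^{N_2} \cdots \sum_{n=1}^{N_6} C_{ijklmn}}

               = \log_2 N_{obs} - \frac{1}{N_{obs}} \sum_{i=1}^{N_1}
                 ( \sum_{j=1}^{N_2} \cdots \sum_{n=1}^{N_6} C_{ijklmn} )
                 \log_2 ( \sum_{j=1}^{N_2} \cdots \sum_{n=1}^{N_6} C_{ijklmn} )      (14b)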


Note the similarity between Equations (14) and (12), the only difference
being in the second logarithmic term on the right-hand side of the
equation. In Equation (12) this term involves the count in a single cell,
while in Equation (14) it is the sum of counts in all cells that have a
given level, i.e., the counts are summed over all excluded variables.
The pattern holds for all other combinations of variables, as shown next
for five-band subsets.

Five-Band Subsets. Choosing a subset of five from six bands is the
same as choosing to eliminate the sixth. The conditional entropy in the
case of eliminating Band X_1 is:

    H(X|X_2,X_3,X_4,X_5,X_6) = \sum_{ijklmn} P_X(ijklmn)
        \log_2 \frac{1}{P(X_{1i}|X_{2j},X_{3k},X_{4l},X_{5m},X_{6n})}      (15a)

      = \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn}
        \log_2 \frac{\sum_{i=1}^{N_1} C_{ijklmn}}{C_{ijklmn}}      (15b)

      = \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 ( \sum_{i=1}^{N_1} C_{ijklmn} )
        - \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 C_{ijklmn}      (15c)

Since

    I_M(X;X_2,...,X_6) = H(X) - H(X|X_2,...,X_6)

we have, from Equations (10) and (15c),

    I_M(X;X_2,X_3,X_4,X_5,X_6) = \log_2 N_{obs}
        - \frac{1}{N_{obs}} \sum_{ijklmn} C_{ijklmn} \log_2 ( \sum_{i=1}^{N_1} C_{ijklmn} )      (16a)

      = \log_2 N_{obs} - \frac{1}{N_{obs}} \sum_{j=1}^{N_2} \cdots \sum_{n=1}^{N_6}
        ( \sum_{i=1}^{N_1} C_{ijklmn} ) \log_2 ( \sum_{i=1}^{N_1} C_{ijklmn} )      (16b)


Again, the form is similar to that of Equations (12) and (14), with the 
summation in the second logarithmic term being over the excluded variable. 
The pattern for subsets of two, three, and four bands should now be clear 
as well. 
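
The common pattern, in which cell counts are summed over the excluded variables before the logarithm is taken, can be sketched as a single Python function that handles subsets of any size. This is an illustrative sketch under stated assumptions, not the program used for this study: the three-band pixel values are hypothetical, and `subset_information` is a name introduced here.

```python
import math
from collections import Counter

def subset_information(pixels, keep):
    """Mutual information I_M(X;X') in bits for the band indices in `keep`.

    Projecting each pixel onto the retained bands and counting the result
    sums the cell counts over all excluded bands.
    """
    n_obs = len(pixels)
    summed = Counter(tuple(p[b] for b in keep) for p in pixels)
    return math.log2(n_obs) - sum(s * math.log2(s) for s in summed.values()) / n_obs

# Hypothetical three-band pixels, for illustration only.
pixels = [(10, 40, 30), (10, 40, 31), (10, 41, 30), (11, 41, 30),
          (20, 35, 50), (20, 35, 50), (21, 35, 50), (21, 36, 51)]

all_bands = subset_information(pixels, [0, 1, 2])   # equals H(X), as in Equation 10
drop_first = subset_information(pixels, [1, 2])     # first band excluded, as in Equation 16
first_only = subset_information(pixels, [0])        # single band, as in Equation 14
print(all_bands, drop_first, first_only)
```

Retaining every band reproduces H(X), and removing bands can only merge cells, so the information never increases as bands are dropped.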


3.2.4 PRELIMINARY RESULTS AND DISCUSSION 

Figure 1 presents information measures for two different quantities, 
as a function of the number of data variables. First, the system-design 
capacities of the Landsat-4 TM and MSS are presented, in terms of the 
number of bits transmitted to the ground and/or recorded on computer- 
compatible tapes (CCT's). For TM, the number of bits recorded on CCT's 
is the same as that transmitted (8 bits/channel). For MSS, however,
the six-bit telemetered data are expanded to seven bits on the CCT's,
with only an apparent gain of information. Nevertheless, most subsequent
comparisons involving MSS will use seven-bit data since that is the form 
in which we have them. Second, the data-space volumes spanned by TM and 
MSS data from the North Carolina agricultural scene are displayed. These 
numbers were computed by summing the bit-equivalent of the data-value 
range (max-min + 1) in each band being considered. 
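
The data-space volume calculation described above can be sketched as follows. The sketch assumes that the "bit-equivalent" of a range is its base-2 logarithm; that interpretation, the function name, and the two-band pixel values are assumptions made for illustration.

```python
import math

def data_volume_bits(pixels, bands):
    """Sum over bands of log2(max - min + 1), the assumed bit-equivalent of each range."""
    total = 0.0
    for b in bands:
        values = [p[b] for p in pixels]
        total += math.log2(max(values) - min(values) + 1)
    return total

# Hypothetical two-band pixels: band 0 spans 16 levels, band 1 spans 64 levels.
pixels = [(10, 40), (13, 47), (25, 103), (12, 55)]
volume = data_volume_bits(pixels, [0, 1])
print(volume)  # 4 bits + 6 bits = 10.0
```

Note that this quantity depends only on the extreme values in each band, not on how the observations are distributed among the cells in between.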

The greater information potential of the TM system design, as
compared to the MSS system, is quantified as 48 vs. 24 bits in telemetered
data. Upon comparing the fractions of their total data-space volumes
that are spanned by data from the agricultural scene, one observes that
the TM data fall nine bits short of capacity while the MSS data fall two
to six bits short, depending on which curve is used as the reference.

Figure 2 compares the data-space volumes spanned by original and
transformed versions of signals from the agricultural scene. The
transformations used here are the linear-combination Tasseled-Cap (TASCAP)
transformations of MSS [3] and TM [4] data, whose principal variables
are Brightness and Greenness. It appears that a bit-rate reduction of
about 3 bits/pixel could be achieved for this agricultural scene,
without loss of information (see discussions of Figures 3 and 4), by
transmitting values from the transformed variables instead of from the
original bands.

While the data volumes spanned are quite large, the information-
measure values are much smaller, less than 14 bits total (constrained
by sample size) for these agricultural data, as shown in Figure 3.
This figure compares the agricultural information content of original
and TASCAP variables from TM and MSS for the North Carolina scene. In
each case, the best subset of each size was used. The mutual
information measure which is plotted reflects the actual data-cell
patterns into which data from the scene were concentrated and dispersed.
For both data sets, relatively little information is gained by the
inclusion of more than three variables. The information content of TM
data is seen to be from one to more than two bits greater than that of
MSS data from the agricultural scene.

Figure 4 illustrates, for the simulated MSS data set, the fact that
the information contents of original band values and two types of
transformed variables are essentially identical. In addition to TASCAP
variables, principal-component variables were extracted and their
information content measured. This equality is in keeping with results
of theoretical analyses.
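
The theoretical basis for this equality is that a one-to-one transformation only relabels the populated cells and leaves their counts unchanged. The Python sketch below illustrates the point with a hypothetical invertible integer transform of two bands. It does not use the TASCAP or principal-component coefficients, which are real-valued and whose requantization could merge or split a few cells in practice.

```python
import math
from collections import Counter

def entropy(items):
    """Entropy in bits from cell counts, in the form of Equation 10."""
    counts = Counter(items)
    n = len(items)
    return math.log2(n) - sum(c * math.log2(c) for c in counts.values()) / n

def transform(pixel):
    """Hypothetical invertible integer transform (not the TASCAP coefficients)."""
    a, b = pixel
    return (a + b, b)   # determinant 1, so distinct inputs give distinct outputs

# Hypothetical two-band pixels, for illustration only.
pixels = [(10, 40), (10, 40), (12, 38), (12, 41),
          (30, 20), (30, 20), (30, 20), (31, 19)]

h_bands = entropy(pixels)
h_transformed = entropy([transform(p) for p in pixels])
print(h_bands, h_transformed)  # identical: the cells are relabeled, not merged
```

The equality applies to the full set of variables. Subsets of the transformed variables can differ from subsets of the original bands, since a single transformed variable mixes several bands.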

Mutual information values for the best and worst band subsets of
each size are presented in Figure 5, to illustrate the range of
information conveyed by various subsets of the data. The differences are
greatest among pairs of variables for both TM and MSS. Figure 6 is a
similar comparison for TASCAP variables. In this case, we find an even
greater disparity between best and worst, due to the decreased
information content of the last TASCAP variables.
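
A best-and-worst comparison of this kind can be produced by exhaustive enumeration of the subsets of each size. The sketch below is one hedged way to organize that search; the pixel values are hypothetical and the function names are introduced here for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def subset_information(pixels, keep):
    """Mutual information in bits for the band indices in `keep`."""
    n_obs = len(pixels)
    summed = Counter(tuple(p[b] for b in keep) for p in pixels)
    return math.log2(n_obs) - sum(s * math.log2(s) for s in summed.values()) / n_obs

def best_and_worst(pixels, n_bands):
    """For each subset size: (size, best subset, best bits, worst subset, worst bits)."""
    rows = []
    for size in range(1, n_bands + 1):
        scored = [(subset_information(pixels, keep), keep)
                  for keep in combinations(range(n_bands), size)]
        best, worst = max(scored), min(scored)
        rows.append((size, best[1], best[0], worst[1], worst[0]))
    return rows

# Hypothetical three-band pixels, for illustration only.
pixels = [(10, 40, 30), (10, 40, 31), (10, 41, 30), (11, 41, 30),
          (20, 35, 50), (20, 35, 50), (21, 35, 50), (21, 36, 51)]

rows = best_and_worst(pixels, 3)
for size, best, best_bits, worst, worst_bits in rows:
    print(size, best, round(best_bits, 2), worst, round(worst_bits, 2))
```

For six bands there are only 63 non-empty subsets, so exhaustive enumeration is inexpensive. At the full set size the best and worst subsets coincide.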

The above comparisons have been made primarily using the sets of
real TM and MSS data from the agricultural scene. Also analyzed was a
simulated data set generated from field-measured reflectance spectra
of agricultural crops and soils. Figures 7 and 8 present comparisons
of data volumes and information contents of the real and simulated TM
and MSS data sets, respectively. Data volumes of the simulated sets
are slightly higher and mutual information values slightly lower than
for the real data. Figures 9 and 10 present the information ranges
spanned by the best and worst subsets for Band and TASCAP variables,
respectively. The trends are very similar to those observed for real
data (Figures 5 and 6).

Results of analyzing the information-measure values for the various
data subsets revealed several interesting trends and produced
interesting comparisons with variance-based measures, not all being
consistent with the author's initial expectations. When the best subsets
were chosen, the information level appeared to be more a function of the
number of variables than of the type of variables (i.e., original vs.
transformed).

With TM, little difference was found among information measures for
the three pairings of TM Brightness, Greenness, and Third-Component, with
the TM Brightness/Third-Component vs. TM Greenness/Third-Component
comparison being 11.1 vs. 11.5 bits for the real data set, as shown in
Figure 11. A greater difference among TM band pairs is evident in
Figure 12. While the proportion of variance explained by the first two
principal components of MSS data was essentially unity, the information
measure showed a lower percentage of the total information was in these
two components. The third MSS TASCAP variable (Yellowness) also showed
a greater information increment than we have come to expect based on
experience in comparing eigenvalues and viewing scatter diagrams of MSS
Greenness vs. Yellowness for agricultural data (likely a result of the
several-count range of values in the Yellowness variable, i.e., of the
thickness of the principal Brightness-Greenness plane). However, the
information measure for the MSS Greenness-Yellowness pair was
substantially lower than for the MSS Brightness-Greenness pair (7.5 vs.
9.5 bits for the real data set), which is consistent with those prior
expectations. The above results indicate a greater data dispersion (and
information potential) for the Third Component of TM than of MSS. Also,
correlations for TM of -0.69 and 0.36 were noted between Third-Component
values and Brightness and Greenness values, respectively, whereas they
were uncorrelated for MSS. This is consistent with another examination
of this agricultural data set which revealed a somewhat planar TM
dispersion pattern that is not aligned with any TM TASCAP axis (although
the use of the TASCAP coordinates can still markedly assist
interpretation and analysis of the data values).

These results and observations are considered to be preliminary
in nature and the reader is urged not to treat them as final, especially
since the possibility for data-set dependence exists and only one real
and one simulated data set were analyzed here. The information measure
employed measures the number of data cells occupied and their populations,
independent of their class membership, but consequently is dependent on
the population composition of the samples that comprise the data sets.
Thus, all results must be interpreted in light of the data populations
analyzed. In the simulated data sets, for instance, vegetation samples
vastly outnumbered bare-soil samples. It is noted that the measure also
is independent of the noise levels in the various bands and of the ease
and consistency of interpretation of the spectral variables (an advantage
ascribed to TASCAP variables); both are other factors that should be
considered. The presence of noise adds variance and could make the
apparent information content greater than the true information content
of ideal signals.
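
The effect of noise on the measure can be illustrated with a small simulation. The sketch below is hypothetical throughout: the four noise-free signal levels, the uniform noise of plus or minus two counts, and the sample size are assumptions chosen only to make the effect visible.

```python
import math
import random
from collections import Counter

def entropy(items):
    """Entropy in bits from cell counts, in the form of Equation 10."""
    counts = Counter(items)
    n = len(items)
    return math.log2(n) - sum(c * math.log2(c) for c in counts.values()) / n

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical ideal signal with four distinct levels, then the same signal
# with hypothetical additive noise of up to two counts in either direction.
ideal = [rng.choice([20, 40, 60, 80]) for _ in range(2000)]
noisy = [v + rng.randint(-2, 2) for v in ideal]

h_ideal = entropy(ideal)
h_noisy = entropy(noisy)
print(h_ideal, h_noisy)  # the noisy signal occupies more cells and scores higher
```

The measure counts occupied cells without regard to why they are occupied, so the additional cells created by noise register as additional information.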
3.3 SIGNIFICANT RESULTS

An information-theoretic measure of the information content of
various subsets of multispectral variables was developed and applied to
a real and a simulated set of simultaneous TM and MSS data from
agricultural crops and soils. Preliminary observations and comparisons
are made.


3.4 PUBLICATIONS AND PRESENTATIONS

A paper describing results of this investigation was invited for 
presentation and publication at the 1984 Purdue/LARS Symposium on 
Machine Processing of Remotely Sensed Data, June 12-14, 1984. Entitled
"Thematic Mapper Radiometric Characterization", and co-authored by
William A. Malila and Michael D. Metzler, it will be presented in a
session on TM Data Quality Analysis which is to be chaired by W. Malila. 

3.5 RECOMMENDATIONS 
None. 

3.6 FUNDS EXPENDED 

A total of approximately was expended during the three
months November 1983 through February 1984. The cumulative spending
through February represents approximately 62% of the total contract
award. Expenditures during the period 1-20 March 1984 are not included
in this percentage value.

INFRARED AND OPTICS DIVISION

FIG. 1 COMPARISON OF TM AND MSS
FIG. 2 COMPARISON OF DATA VOLUMES: REAL AGRICULTURAL DATA
FIG. 3 COMPARISON OF AGRICULTURAL INFORMATION CONTENT: REAL DATA
       (curves: TM Bands, TM TASCAP, MSS Bands, MSS TASCAP)
FIG. 4 INFORMATION COMPARISON OF MSS BAND, TASCAP, AND
       PRINCIPAL-COMPONENT VARIABLES (SIMULATED AG. DATA)
       (curves: BANDS, TASCAP, PRIN. COMP.)
FIG. 5 RANGE OF INFORMATION IN SUBSETS OF BANDS: REAL AGR. DATA
       (TM Range, MSS Range)
FIG. 6 RANGE OF INFORMATION IN SUBSETS OF TASCAP VARIABLES: REAL AGR. DATA
       (TM Range, MSS Range)
FIG. 7 COMPARISON OF REAL AND
FIG. 8 COMPARISON OF REAL AND
FIG. 9 RANGE OF INFORMATION IN SUBSETS OF BANDS:
FIG. 10 RANGE OF INFORMATION IN SUBSETS OF TASCAP VARIABLES: SIMULATED AGR. DATA
       (TM Range, MSS Range)
FIG. 11 TM-MSS TASCAP COMPARISONS:
FIG. 12 TM-MSS BAND COMPARISONS:

Axes (where legible): NUMBER OF VARIABLES (horizontal),
INFORMATION MEASURE (BITS) (vertical).